A new method for customized fetal growth reference percentiles

Background: Customized fetal growth charts assume that birthweight at term is normally distributed across the population, with a constant coefficient of variation that also applies at earlier gestational ages. Thus, the standard deviation used for computing percentiles (e.g., 10th, 90th) is assumed to be proportional to the customized mean, although this assumption has never been formally tested.

Methods: In a secondary analysis of the NICHD Fetal Growth Studies-Singletons (12 U.S. sites, 2009–2013) using longitudinal sonographic biometric data (n = 2288 pregnancies), we investigated the assumptions of normality and constant coefficient of variation by examining the behavior of the mean and standard deviation, computed following the Gardosi method. We then created a more flexible model that customizes both the mean and the standard deviation using heteroscedastic regression, and calculated customized percentiles directly using quantile regression, with an application to a separate study of 102,012 deliveries at 37–41 weeks.

Results: Analysis of term optimal birthweight challenged the assumptions of proportionality and normality: across different mean birthweight values, the standard deviation did not change linearly with mean birthweight, and the percentiles computed under the normality assumption deviated from the empirical percentiles. Composite neonatal morbidity and mortality rates in relation to birthweight < 10th percentile were higher for the heteroscedastic and quantile models (10.3% and 10.0%, respectively) than for the Gardosi model (7.2%), although prediction performance was similar among all three (c-statistic 0.52–0.53).

Conclusions: Our findings question the normality and constant coefficient of variation assumptions of the Gardosi customization method. A heteroscedastic model captures unstable variance across customization characteristics, which may improve detection of abnormal growth percentiles.

Trial registration: ClinicalTrials.gov identifier: NCT00912132.


Introduction
Fetal undergrowth, often characterized as fetal growth restriction (FGR) or small-for-gestational age (SGA), is associated with an increased risk of perinatal morbidity and mortality [1]. SGA is often defined as birthweight < 10th percentile using a population-based growth reference [2]. However, this approach identifies fetuses who are constitutionally small but otherwise healthy. It also misses fetuses who did not meet their growth potential but whose weight is at or above the 10th percentile. In 1992, Gardosi et al. proposed a customized method for birthweight references. The method took into account six pregnancy characteristics known to influence birthweight and thought to be physiologic, namely gestational age, maternal pre-pregnancy weight, height, race, parity, and fetal sex [3]. It was later extended from birthweight to estimated fetal weight during gestation by using fetal ultrasonographic biometric data and a commonly used fetal growth reference from Hadlock [4,5]. The percentiles of the ultrasound estimated fetal weight (EFW) curves (e.g., 10th, 50th and 90th) were proportionately adjusted upward or downward according to the expected optimal birthweight at term from the Gardosi method for a given set of maternal and fetal characteristics. Customized fetal growth references are appealing because they provide a more personalized definition of FGR and SGA, in line with a precision medicine approach. However, whether their use improves the clinical detection of fetuses with suboptimal growth who are at risk of morbidity and mortality is controversial [6][7][8]. Nevertheless, they have been recommended by national guidelines in some countries, including Britain, Ireland and New Zealand [9].
A recent randomized trial did not demonstrate improved prenatal detection of SGA using the Growth Assessment Protocol, which is based on customized fetal growth charts, compared with standard care. However, the negative results have been questioned because of poor adherence in the intervention arm and contamination of the "standard care" arm with some parts of the intervention [10,11]. The primary metric of the Gardosi method is a customized term optimal birthweight (TOW) at 40 weeks. This value is then extrapolated to EFW at any gestational age using the proportionality model [12]. Based on this model and the proportionality assumption, percentiles (e.g., 5th, 10th, 90th, 95th) of EFW are produced at all gestational ages between 24 and 42 weeks. However, the customized TOW percentiles rest on two assumptions. The first is that birthweight is normally distributed. The second is that the standard deviation used for calculating the percentiles (e.g., 10th, 90th) is proportional to the mean, i.e., that the coefficient of variation (CV) is constant. These assumptions have never been formally tested, yet they have important clinical implications, because different percentile cutoffs will classify different proportions of fetuses as SGA versus non-SGA. Such differential classification could increase the risk of stillbirth in pregnancies where SGA goes undetected. It could also lead to unnecessary iatrogenic early delivery in pregnancies where SGA is erroneously diagnosed.
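To make the constant-CV assumption concrete, the Python sketch below computes a customized percentile when birthweight is treated as normal with a standard deviation proportional to the customized mean (SD = CV × TOW). The CV value and the "proportion of term weight" factor used for extrapolation are hypothetical inputs, not the published Gardosi constants. The function names are illustrative only.

```python
from statistics import NormalDist


def constant_cv_percentile(tow_g: float, cv: float, p: float) -> float:
    """p-th percentile (0 < p < 1) of birthweight under a constant-CV normal model.

    Assumes birthweight ~ Normal(mean=TOW, sd=cv * TOW), so the percentile is
    TOW + z_p * cv * TOW = TOW * (1 + z_p * cv).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    z_p = NormalDist().inv_cdf(p)  # standard normal quantile; z_0.90 is about 1.2816
    return tow_g * (1.0 + z_p * cv)


def extrapolate_to_gestation(value_at_term_g: float, proportion_of_term: float) -> float:
    """Scale a term (40-week) value back to an earlier gestational age.

    `proportion_of_term` is a hypothetical stand-in for the proportionality curve
    (expected weight at that age divided by weight at 40 weeks). Every percentile
    is multiplied by the same factor, so the CV stays constant across gestation.
    """
    return value_at_term_g * proportion_of_term


if __name__ == "__main__":
    cv = 0.11  # hypothetical CV, for illustration only
    for tow in (3200.0, 3510.0, 3800.0):
        p10 = constant_cv_percentile(tow, cv, 0.10)
        p90 = constant_cv_percentile(tow, cv, 0.90)
        # The 10th-90th band width is 2 * 1.2816 * cv * TOW, i.e. proportional to TOW.
        print(f"TOW {tow:.0f} g: 10th {p10:.0f} g, 90th {p90:.0f} g, width {p90 - p10:.0f} g")
```

Because every percentile is a fixed multiple of TOW, the gap between the 10th and 90th percentiles grows in direct proportion to the customized mean. This proportionality is the property the study set out to test.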
This study was a secondary analysis of the NICHD Fetal Growth Studies-Singletons, a prospective pregnancy cohort study whose primary aim was to establish fetal growth standards for size and velocity in the U.S. [13][14][15]. Our first objective was to evaluate two assumptions of the Gardosi customization model. The first is that the distribution of TOW around its customized mean is normal. The second is that the standard deviation used in calculating the percentiles is proportional to the mean TOW (constant CV). Our second objective was to create a new customization method with more flexibility in calculating customized percentiles. This method uses heteroscedastic regression to customize both the mean TOW (and hence EFW by extrapolation) and the standard deviation [16]. Strictly, the heteroscedastic model customizes a transformed value of the standard deviation. Because this makes the standard deviation depend on the customizing factors, we hereafter refer to it as a model for customizing the standard deviation. Also, since SGA and large-for-gestational age (LGA) are defined by percentile cutoffs (e.g., 10th, 90th), we further customized fetal growth using quantile regression. Quantile regression estimates the percentiles directly, without relying on a model for the mean or on the assumption of normality [17]. We compared the performance of all three customization methods in classifying SGA and LGA birthweight in relation to neonatal morbidity and mortality. The comparison was made within the NICHD Fetal Growth Studies-Singletons and also in a concurrent analysis of the Consortium on Safe Labor, which has a larger number of births.
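The three customization approaches can be summarized side by side. Here $\mathbf{x}$ denotes the customization characteristics, $z_p = \Phi^{-1}(p)$ the standard normal quantile, and $Q_p(\mathbf{x})$ the customized $p$-th percentile. The log link for the standard deviation is an assumption made here for illustration (one common way to keep the modeled SD positive). The study states only that a transformed value of the standard deviation is customized.

```latex
\begin{aligned}
\text{Gardosi (constant CV):}\quad & Q_p(\mathbf{x}) = \mu(\mathbf{x})\,\bigl(1 + z_p\,\mathrm{CV}\bigr),
  && \mu(\mathbf{x}) = \mathbf{x}^{\top}\boldsymbol{\beta},\\
\text{Heteroscedastic:}\quad & Q_p(\mathbf{x}) = \mu(\mathbf{x}) + z_p\,\sigma(\mathbf{x}),
  && \log\sigma(\mathbf{x}) = \mathbf{x}^{\top}\boldsymbol{\gamma},\\
\text{Quantile regression:}\quad & Q_p(\mathbf{x}) = \mathbf{x}^{\top}\boldsymbol{\beta}_p,
  && \boldsymbol{\beta}_p = \arg\min_{\mathbf{b}} \sum_i \rho_p\bigl(y_i - \mathbf{x}_i^{\top}\mathbf{b}\bigr),
  \quad \rho_p(u) = u\,\bigl(p - \mathbf{1}\{u < 0\}\bigr).
\end{aligned}
```

The first two rows rely on normality. Only the second lets the spread depend on $\mathbf{x}$ other than through the mean. The third makes no distributional assumption and fits each percentile separately.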

Study design and participants
The NICHD Fetal Growth Studies-Singletons recruited 2334 non-obese women (BMI 19.0-29.9 kg/m²) from four race/ethnic groups at 12 U.S. centers from 2009 to 2013. These women were non-smokers and had low-risk medical and obstetrical histories (e.g., no chronic diseases). Details of recruitment and study design have been reported previously [18]. An additional 468 women with BMI 30.0-44.9 kg/m² were recruited with similar inclusion criteria. For this group, the criteria were relaxed to allow certain chronic conditions (e.g., chronic hypertension controlled with medication), given the higher prevalence of concurrent morbidities with obesity [19]. Institutional review board approval was obtained at all participating sites as well as at the NIH (IRB approval #09-CH-N152) in December 2009, before the study began. All participants provided written informed consent prior to data collection.

Procedures
Gestational age was based on a certain last menstrual period and confirmed by first-trimester ultrasound [18]. At enrollment, information was collected via in-person interview on demographics, obstetrical and medical histories, and lifestyle and health leading up to and during the first trimester of pregnancy. After an enrollment sonogram at 10-13 weeks of gestation, women were randomly assigned to one of four ultrasound schedules for follow-up visits within the ranges 16-22, 24-29, 30-33, 34-37 and 38-41 weeks of gestation. A window of ±1 week around each assigned visit was allowed to accommodate women's availability. Sonographers for the study underwent uniform, centralized training and credentialing. At each study visit, a standardized protocol was used to obtain fetal biometry, including head circumference (HC), abdominal circumference (AC), and femur length (FL), using identical, high-resolution ultrasound units at each center. HC, AC, and FL were used to calculate EFW with a Hadlock formula [20]. Information on lifestyle and on reproductive and medical history was obtained via in-person interviews at each research visit. Trained research personnel abstracted demographic data and the antenatal, labor, delivery and neonatal course and outcomes from the prenatal record, the labor and delivery summary, and hospital and neonatal records. Paternal height and weight were obtained by maternal report.
SGA and LGA, and their associated neonatal morbidities, were defined similarly in a concurrent analysis of n = 102,012 deliveries at 37-41 weeks from the Consortium on Safe Labor (CSL) [27]. Pregnancy, labor and delivery information was electronically abstracted from maternal records. Neonatal records included information on gestational age, NICU admission, medical conditions and discharge diagnoses. International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) codes were collected and linked to deliveries. Outcomes were defined to be consistent with previous CSL studies [28].

Statistical analysis
Demographic data were summarized as n (%) or mean (± SD). We developed a fetal growth percentile customization model using the Gardosi method [4]. Linear regression was used to predict birthweight at 40 weeks, designated the term optimal weight (TOW), from six customization variables: gestational age, maternal pre-pregnancy weight, height, race/ethnicity, parity, and infant sex. We then examined two assumptions of the Gardosi model for the TOW distribution: normality and constant CV. If these assumptions held, the percentiles computed from the Gardosi model should agree with the empirical percentiles across different levels of mean birthweight. We therefore stratified the estimated birthweight into eight contiguous intervals, representing eight different values of mean birthweight. For each interval, we examined the agreement between the empirical percentiles and those obtained from the Gardosi model. To check the constant-CV assumption, we examined the relationship between the empirical standard deviation and the mean birthweight across the birthweight intervals. The Gardosi model assumes that the standard deviation is proportional to the customized mean. As an extension of it, we created a model that customizes both the mean and the standard deviation of the TOW using heteroscedastic regression [4]. This model used predicted birthweight at 40 weeks as the outcome and the same six customization variables. The customized mean and SD from this model yielded customized values for the target percentiles via the quantile formula for the normal distribution (S1 Table).
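The idea of customizing the standard deviation can be illustrated with a minimal, dependency-free Python sketch that uses a single centered covariate. The sketch swaps in Harvey's two-step estimator for multiplicative heteroscedasticity in place of the SAS Proc AUTOREG fit used in the study. In step 1, ordinary least squares fits the mean. In step 2, log squared residuals are regressed on the covariate to estimate log σ. The covariate, simulated parameters, and function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

# E[log X] for X ~ chi-square(1) equals -(Euler-Mascheroni constant) - log(2), about -1.2704
LOG_CHISQ1_MEAN = -0.5772156649015329 - math.log(2.0)


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_heteroscedastic(x, y):
    """Two-step fit of y = a + b*x + e with sd(e) = exp(c + d*x).

    Step 1: OLS for the mean. Step 2: OLS of log squared residuals on x,
    bias-corrected and halved to give the log-SD coefficients (c, d).
    """
    a, b = simple_ols(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    log_e2 = [math.log(max(r * r, 1e-12)) for r in resid]  # guard against log(0)
    c2, d2 = simple_ols(x, log_e2)
    c2 -= LOG_CHISQ1_MEAN  # E[log e^2] = 2c + 2d*x + E[log chi2_1]
    return a, b, c2 / 2.0, d2 / 2.0


def customized_percentile(params, x, p):
    """Normal-theory p-th percentile with covariate-specific mean and SD."""
    a, b, c, d = params
    return a + b * x + NormalDist().inv_cdf(p) * math.exp(c + d * x)


if __name__ == "__main__":
    rng = random.Random(2024)
    # Hypothetical data: x is a centered covariate (e.g., kg above a reference weight).
    xs = [rng.uniform(-10.0, 10.0) for _ in range(5000)]
    ys = [3400.0 + 15.0 * x + rng.gauss(0.0, 300.0 * math.exp(0.03 * x)) for x in xs]
    params = fit_heteroscedastic(xs, ys)
    print("a, b, c, d =", [round(v, 4) for v in params])
    for x in (-10.0, 0.0, 10.0):
        p10 = customized_percentile(params, x, 0.10)
        p90 = customized_percentile(params, x, 0.90)
        print(f"x={x:+.0f}: 10th {p10:.0f} g, 90th {p90:.0f} g")
```

Unlike the constant-CV model, the 10th-90th band here widens or narrows according to the covariate's own coefficient d, independently of how the mean changes.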
In a third customization model, we calculated customized percentiles directly using quantile regression with monotonic smoothing, a flexible model that does not assume a normal distribution [17]. Note that quantile regression customizes the target percentiles directly. It does not use a two-step procedure in which the customized mean and standard deviation are obtained first and the percentiles are then computed with the quantile formula for the normal distribution. All three models included the same customizing variables, which contained quadratic and cubic terms for the deviation of gestational age at delivery from the optimal 280-day mark, specified a priori as in the Gardosi model. In addition to the six proposed "physiological" variables that influence fetal growth, the models also included "pathological" variables: smoking, BMI (kg/m²), gestational diabetes, gestational hypertensive disease/preeclampsia, and antepartum bleeding. The analysis was centered on 280 days' gestation, height 163 cm, pre-pregnancy weight 64 kg, nulliparity, and Non-Hispanic White race/ethnicity. However, only the coefficients for the six "physiologic" variables (as designated by the Gardosi method) were included in an additive model to calculate the TOW percentiles [12]. The six variables were categorized as in the Gardosi model, with slight alterations due to data availability. Specifically, we included four race/ethnic groups (Asian, Hispanic, Non-Hispanic Black, Non-Hispanic White) instead of ethnic origin, which was not available in our study. Parity 2 and greater (P2+) was combined into one group because of sparse data for higher parity. The Gardosi model, by contrast, includes each level separately: P0 (ref), P1, P2, P3, P≥4. Standard goodness-of-fit and model diagnostics were performed.
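Quantile regression estimates a percentile line by minimizing the asymmetric "check" (pinball) loss rather than squared error, so no normality assumption is needed. The Python sketch below fits a single-covariate linear quantile regression exactly for small samples. It relies on the fact that some optimal line passes through two observations with distinct covariate values. It is an illustrative brute-force search (O(n³)), not the monotonically smoothed Proc QUANTREG fit used in the study. The data and function names are hypothetical.

```python
import random
from itertools import combinations


def check_loss(u: float, tau: float) -> float:
    """Pinball loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(x, y, a, b, tau):
    """Sum of check losses for the line a + b*x."""
    return sum(check_loss(yi - (a + b * xi), tau) for xi, yi in zip(x, y))


def quantile_line(x, y, tau):
    """Exact linear quantile regression y ~ a + b*x for small n.

    Some optimal solution of this linear program interpolates two data points,
    so searching all lines through pairs with distinct x finds the minimum.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = total_loss(x, y, a, b, tau)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


if __name__ == "__main__":
    rng = random.Random(11)
    xs = [rng.uniform(-10.0, 10.0) for _ in range(60)]
    ys = [3400.0 + 15.0 * x + rng.gauss(0.0, 300.0 + 10.0 * x) for x in xs]
    for tau in (0.10, 0.50, 0.90):
        a, b = quantile_line(xs, ys, tau)
        print(f"tau={tau:.2f}: Q(x) = {a:.1f} + {b:.2f} * x")
```

Each percentile gets its own intercept and slope, so the spread between, say, the 10th and 90th lines can change with the covariate without any model for the mean or SD. Production software solves the same linear program far more efficiently than this exhaustive search.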
The customization method based on the six maternal and fetal factors noted above calculates the term optimal weight at 40 weeks. This value is then extrapolated back to ultrasound EFW across gestation using the Hadlock reference, with the percentiles (e.g., 10th, 50th, 90th) proportionately adjusted upward or downward based on the profile. Therefore, to check the cross-sectional consistency of the variance, the heteroscedastic model was fitted a second time using EFW in rolling pairs of weeks (21-22, 22-23, etc.) instead of extrapolating from term. Pairs of weeks were used because there were too few observations at each individual week.
Both the heteroscedastic regression model with separately customizable mean and standard deviation and the quantile regression model that explicitly produced customized percentiles were then compared to the Gardosi model [29]. Note that under the normality assumption in the Gardosi and heteroscedastic models, the mean is equal to the median value. We computed the 5th, 10th, 50th, 90th and 95th percentiles for birthweight for deliveries at 37-41 weeks for a hypothetical mother whose customization factors were set to population average values in the NICHD Fetal Growth Studies-Singletons. The analysis was performed for each of the 3 models and the estimated percentiles were plotted for comparison. The equations to calculate the percentiles for the 3 models are presented in the Supplement. We also calculated the mean, median, SD, 10th, and 90th percentiles for the 3 models using EFW (instead of birthweight) at 38 and 39 weeks in the NICHD Fetal Growth Studies-Singletons.
We also compared the performance of the three customization models (Gardosi, heteroscedastic and quantile regression) and the Duryea birthweight reference in classifying SGA and LGA birthweight in relation to neonatal morbidity and mortality [21]. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) were calculated for each SGA and LGA classification from the three customization models against the observed neonatal morbidity and mortality, using multivariable logistic regression. Performance of the customization models was first compared using EFW at 38-39 weeks from the NICHD Fetal Growth Studies-Singletons. Analyses were then repeated using birthweight from the CSL study, because EFW was not available in the CSL. This step examined the reproducibility and generalizability of the findings, albeit using birthweight. The CSL study included a much larger sample of deliveries, on which out-of-sample prediction performance was tested. Moreover, because its primary goal was to develop a fetal growth standard, the NICHD Fetal Growth Studies-Singletons targeted recruitment of low-risk pregnancies and excluded pregnancies at higher risk for fetal growth abnormalities. Recruitment for the CSL had no such restriction.
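The diagnostic metrics named above follow from a 2×2 table that cross-classifies a customized SGA (or LGA) label against observed composite morbidity. In that layout, the morbidity rate among neonates classified as SGA is the PPV. For a single binary classifier, the c-statistic (area under the ROC curve) reduces to the mean of sensitivity and specificity. This is one reason a dichotomous classification can have a c-statistic close to 0.5 even when it enriches for morbidity. The Python sketch below computes these quantities. The counts are hypothetical, and the sketch does not reproduce the study's multivariable logistic regression.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Binary classification (e.g., SGA by a customized chart) against a
    binary outcome (e.g., composite neonatal morbidity and mortality)."""
    tp: int  # classified SGA, outcome present
    fp: int  # classified SGA, outcome absent
    fn: int  # not SGA, outcome present
    tn: int  # not SGA, outcome absent

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        # Outcome rate among neonates classified as SGA
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def c_statistic(self) -> float:
        # Area under a one-point ROC curve: (sensitivity + specificity) / 2
        return (self.sensitivity() + self.specificity()) / 2.0


if __name__ == "__main__":
    table = TwoByTwo(tp=10, fp=90, fn=40, tn=860)  # hypothetical counts
    print(f"sens={table.sensitivity():.3f} spec={table.specificity():.3f} "
          f"PPV={table.ppv():.3f} NPV={table.npv():.3f} c={table.c_statistic():.3f}")
```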

Results
Of the 2802 women recruited for the NICHD Fetal Growth Studies-Singletons, we excluded those who met any of the following conditions: deemed ineligible after enrollment; fetal anomalies or neonatal aneuploidy; deactivated (e.g., for pregnancy loss, relocation, pregnancy termination, or loss to follow-up); delivered < 37 weeks; or had missing information. This left 2288 women for the final analysis (S1 Fig). Study participants were racially/ethnically diverse, with a mean maternal age of 28.2 (± 5.4) years. Overall, 46% were nulliparous, 56% had a BMI of 18.5 to < 25 kg/m², 26% had a BMI of 25 to < 30 kg/m², and 16% had a BMI of ≥ 30.0 kg/m² (Table 1).

Evaluation of customization assumptions
To evaluate the assumptions of normality and constant CV, we examined the data as follows. The data were sorted by the mean estimated TOW from the Gardosi model and divided into eight contiguous intervals of equal length. Each interval represents cases with a specific value of TOW (the mean birthweight in the interval). The number of observations per interval was not equal, with fewer observations in the extreme intervals. However, each interval contained enough observations for the mean, standard deviation and percentiles to be estimated reliably. For each interval, we computed the empirical percentiles and standard deviation of birthweight, as well as the mean predicted TOW from the Gardosi model. We also computed the percentiles under the normality and constant-CV assumptions of the Gardosi model. The results are presented in Figs 1 and 2. Table 2 presents the results from the 3 models. As expected, the term optimal weight of 3510 g was similar for the Gardosi and heteroscedastic models, since the mean equals the median under the assumption of normality. However, the median term optimal weight from the quantile regression was lower (3487 g), which challenges the assumption of normality.
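The interval-based check described above can be sketched as follows. The range of predicted TOW is split into eight contiguous equal-width intervals. Within each interval, the empirical 10th and 90th percentiles of birthweight are compared with the normal-theory percentiles implied by a constant CV. The CV value, minimum bin size, simulated data, and function name are hypothetical. Under the Gardosi assumptions, the per-interval ratio SD/mean would be roughly constant and the two sets of percentiles would agree.

```python
import random
from statistics import NormalDist, mean, quantiles, stdev


def interval_check(predicted_tow, birthweight, cv, n_intervals=8, min_n=10):
    """Compare empirical and constant-CV percentiles within TOW intervals.

    predicted_tow, birthweight: paired sequences in grams.
    cv: coefficient of variation assumed by the constant-CV model (hypothetical).
    Returns one dict per interval holding at least `min_n` observations.
    """
    lo, hi = min(predicted_tow), max(predicted_tow)
    if hi == lo:
        raise ValueError("predicted TOW values must span a non-zero range")
    width = (hi - lo) / n_intervals
    bins = [[] for _ in range(n_intervals)]
    for tow, bw in zip(predicted_tow, birthweight):
        k = min(int((tow - lo) / width), n_intervals - 1)  # put the maximum in the last bin
        bins[k].append((tow, bw))

    z10, z90 = NormalDist().inv_cdf(0.10), NormalDist().inv_cdf(0.90)
    rows = []
    for k, cell in enumerate(bins):
        if len(cell) < min_n:
            continue  # too few observations for stable percentiles
        tow_mean = mean(t for t, _ in cell)
        bws = [bw for _, bw in cell]
        deciles = quantiles(bws, n=10)  # 9 cut points: [0] = 10th, [8] = 90th
        sd = stdev(bws)
        rows.append({
            "interval": k,
            "n": len(cell),
            "mean_tow": tow_mean,
            "empirical_sd": sd,
            "empirical_cv": sd / mean(bws),
            "empirical_p10": deciles[0],
            "empirical_p90": deciles[8],
            "constant_cv_p10": tow_mean * (1.0 + z10 * cv),
            "constant_cv_p90": tow_mean * (1.0 + z90 * cv),
        })
    return rows


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data in which the SD does not grow in proportion to the mean.
    tow = [rng.uniform(2900.0, 4100.0) for _ in range(3000)]
    bw = [t + rng.gauss(0.0, 250.0 + 0.05 * (t - 2900.0)) for t in tow]
    for row in interval_check(tow, bw, cv=0.11):
        print(row["interval"], row["n"], round(row["mean_tow"]), round(row["empirical_cv"], 3),
              round(row["empirical_p10"]), round(row["constant_cv_p10"]))
```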

Creation and comparison of three customization models
The beta-coefficients and standard errors for the mean component were similar in the heteroscedastic and Gardosi models (Table 2). Interestingly, only the linear terms for maternal height and weight were statistically significant (in both models), not the quadratic or cubic terms. However, we retained the quadratic and cubic terms because they are included in the Gardosi model and our main interest was in the variance terms. In the heteroscedastic model, only pre-pregnancy weight significantly affected the standard deviation (linear term β = 0.0145). Some of the other variables showed a potentially non-constant influence on the variability of TOW. Standard goodness-of-fit and model diagnostics indicated that all 3 models fit well overall, and the residuals showed no appreciable departure from the model assumptions.

Evaluation of model performance across gestation
To check the cross-sectional consistency of the variance, the heteroscedastic model was fitted again using EFW in rolling pairs of weeks (21-22, 22-23, etc.) instead of birthweight (S2 Table). The aim was to assess whether the model assumptions hold at any point in gestation, not just at delivery with birthweight. Although sporadic differences in variance were observed by maternal weight and height, no systematic dependence on any particular characteristic was found across gestation. These findings from the rolling weekly-pair analysis indicate no specific departure from the heteroscedastic customization model across gestation. Interestingly, however, the main effects of three of the six characteristics (maternal height, weight, and parity) in the mean customization model for EFW were not consistent across gestational weeks. Maternal height was associated with increased EFW from around 28 to 30 weeks of gestation, and again from around 33 weeks onward. Maternal weight was also associated with increased EFW from around 29 to 31 weeks, and again from around 33 weeks onward. Increasing parity was associated with increased EFW starting at the beginning of the third trimester, around 28 weeks. This association did not reach statistical significance until toward the end of pregnancy (not adjusted for multiple testing).

Evaluation of model performance across customization characteristics
To evaluate model performance, the 5th, 10th, 50th, 90th and 95th percentiles were compared among the three models for the six characteristics. Analyses comparing birthweight for deliveries at 37-41 weeks in the NICHD Fetal Growth Studies-Singletons are presented in Fig 3 for illustration. The 50th percentile was similar across all 3 models, with the quantile regression percentile only slightly lower than the other two. The percentiles for the Gardosi model were farther apart than those of the other two models, which were more closely aligned with each other. The Gardosi model thus gave a slightly lower birthweight at the 5th and 10th percentile cutpoints, and a slightly higher birthweight at the 90th and 95th percentile cutpoints, than the heteroscedastic and quantile regression models. In the heteroscedastic and quantile regression models, the EFW 10th and 90th percentiles were also closer to one another than in the Gardosi model across a range of maternal weights: 57 kg, 64 kg and 75 kg, corresponding to the 25th, 50th, and 75th percentiles, respectively (Table 3). For example, for a woman with a pre-pregnancy weight of 57 kg, the EFW 10th percentile at 37-38 weeks was 99 g larger with customized variance (2571 g, heteroscedastic) and 131 g larger with quantile regression (2603 g) than with the Gardosi model (2472 g). The EFWs at the 90th percentile were 99 g and 26 g smaller, respectively.

Assessment of the effect of paternal characteristics on birthweight
Paternal height and weight were also independently associated with birthweight (S3 Table). Both parents' measurements were included in the model. For each 1 cm increase in paternal height from the average of 177.8 cm, EFW increased by approximately 3 g (4 g using quantile regression). For each 1 cm increase in maternal height from the average of 163 cm, EFW increased by 5 g (4 g using quantile regression). For each 1 kg increase in paternal weight from the average of 81.6 kg, EFW also increased by 3 g, compared with 7 g for each 1 kg increase in maternal weight from the average of 64 kg.

Summary of actual and predicted birthweight for the NICHD Fetal Growth Studies-Singletons
To evaluate comparative model performance, we calculated the median and the 10th and 90th percentiles for birthweight in the NICHD Fetal Growth Study. The empirical (observed) mean birthweight (37-41 weeks) was 3371 g. This was similar to the estimated term optimal birthweight of 3374 g for the Gardosi model and 3375 g for the heteroscedastic model, indicating that these models estimated the observed mean birthweight well. The estimated term optimal birthweight from the quantile regression was 3350 g, 22 g lower than the observed mean birthweight. This is expected, because quantile regression models the median of the distribution rather than the mean. The standard deviations of the predicted birthweights were narrower for the customization models (248 g for Gardosi, 246 g for heteroscedastic, and 234 g for quantile regression) than the 447 g of the observed birthweight. This difference arises because the predicted values are customized central values that vary only with the customization factors. Observed birthweights, by contrast, also vary around those central values, so the extremes of the observed distribution are more widely dispersed than those of the predicted distributions. Such a phenomenon is therefore not unexpected. Since the customization models use measures of central tendency (i.e., mean/median), the predicted distributions of birthweight are well aligned with the observed distribution at the center of the data. The discrepancy in percentiles between the observed and predicted distributions is more pronounced toward the tails of the distribution, with the 10th and 90th percentiles differing by two to three hundred grams.
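The gap between predicted and observed dispersion can be made explicit with the law of total variance. Fitted central values carry only the variance explained by the customization factors. The residual variance around each customized value is left out. As a rough illustration only, the Gardosi predicted values can be treated as conditional means for the same births, a simplifying assumption not made in the study. Under that assumption, the reported standard deviations would imply a residual standard deviation of about 372 g.

```latex
\operatorname{Var}(Y) \;=\; \underbrace{\operatorname{Var}\bigl(\mathrm{E}[Y \mid \mathbf{x}]\bigr)}_{\text{spread of predicted values}}
\;+\; \underbrace{\mathrm{E}\bigl[\operatorname{Var}(Y \mid \mathbf{x})\bigr]}_{\text{residual spread}},
\qquad
\sqrt{447^2 - 248^2} = \sqrt{199\,809 - 61\,504} = \sqrt{138\,305} \approx 372\ \text{g}.
```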

Neonatal morbidity prediction across three customization models
Finally, we applied the models to birthweight data at 37-41 weeks in the CSL and compared classification of SGA and LGA in relation to neonatal morbidity and mortality in the CSL (Table 4). While the composite neonatal morbidity and mortality rates in relation to SGA were higher for the heteroscedastic and quantile regression models (10.3% and 10.0%, respectively) than the Gardosi model (7.2%), the prediction performance was similar among the 3 customization models as well as the Duryea population-based birthweight reference (c-statistic 0.52-0.54) [21]. The pattern was similar for LGA (c-statistic 0.53 for all). Findings were similar in the NICHD Fetal Growth Studies-Singletons analysis for EFW at 38-39 weeks (S4 Table).

Discussion
We performed an in-depth examination of the statistical assumptions of the Gardosi customization method [4]. Our investigation indicates that, for the six customization characteristics, the standard deviation does not vary in proportion to the mean birthweight across gestation. These findings question the constant coefficient of variation assumption of the Gardosi customization model, namely that the standard deviation, and therefore the customized percentiles, is proportional to the mean birthweight. We therefore created a model that simultaneously estimates both the customized mean and the standard deviation using heteroscedastic regression. Also, since our findings questioned the assumption that the data were normally distributed, we further investigated direct customization using a quantile regression model that does not assume a normal distribution. The 50th percentile EFW was similar across models. However, the 10th and 90th percentiles for the Gardosi model were farther apart than for the other two models, resulting in a lower birthweight at the 10th percentile cutpoint and a higher birthweight at the 90th percentile cutpoint.
Term optimal weight and customized 10th and 90th percentiles (g) by model (partial table as extracted; column headings not available):
Model                 Quantity               (1)     (2)     (3)
Gardosi               Term optimal weight    3447    3506    3579
Heteroscedastic       Term optimal weight    3447    3506    3579
Quantile regression   Term optimal weight    3424    3483    3549
Gardosi               10th percentile        2878    2927    2988
Heteroscedastic       10th percentile        2994    3028    3065
Quantile regression   10th percentile        3030    3067    3109
Heteroscedastic       90th percentile        3901    3984    4094
Quantile regression   90th percentile        3932    4035    4164
Note: All calculations were performed in SAS with non-rounded numbers. Proc GLM was used for the Gardosi-based models, Proc AUTOREG for the heteroscedastic models and Proc QUANTREG for the quantile regression model. The percentiles were calculated per the equations in S1 Table. For example, the 90th percentile from the Gardosi model was calculated from the term optimal weight (TOW) as TOW + (1…
Composite neonatal morbidity and mortality rates in relation to birthweight < 10th percentile were higher for the heteroscedastic and quantile regression models (10.3% and 10.0%, respectively) than for the Gardosi model (7.2%), although prediction performance was similar among all three (c-statistic 0.52-0.53). Thus, although there was some departure from the assumptions of the Gardosi model, it still performed well in comparison with a more flexible heteroscedastic model. Quantile regression resolves the issue of the normality assumption. Its similar performance in estimating the percentiles indicates that the other two models may generally be robust to the normality assumption, at least in the study population considered, since non-normality did not have an appreciable impact on model performance. In summary, the heteroscedastic model is as straightforward to implement as the Gardosi model and has the advantage of being able to capture unstable variance across the customization characteristics if needed. Quantile regression seems to be a natural choice for modeling quantiles when the standard assumptions of normal-distribution models are suspect. Quantile regression was used to create the WHO fetal growth charts and also for a customized fetal growth reference in an African-American population [30,31]. However, the price of its greater flexibility is that quantile regression generally requires a larger sample size to achieve accuracy comparable to that of linear regression models [17]. In the study by Kabiri et al., a customized fetal growth reference based on quantile regression did not improve prediction of perinatal morbidity compared with ultrasound references [30].
In the present cohort, the model also did not show significant improvement in birthweight prediction. As more data from controlled studies become available, however, the merits of flexible models compared with linear regression-based models could be better evaluated in the context of birthweight customization. Our investigation of the assumption, made by customization methods, that the standard deviation is proportional across birthweight values is novel. In addition, the effect of the covariates on fetal growth across gestation had also been assumed to be fixed. We found, however, that the effect of pre-pregnancy weight on EFW was both non-constant and non-linear, and that in the heteroscedastic model maternal pre-pregnancy weight significantly affected the variance. Maternal height and parity were also associated with increased EFW starting at the beginning of the third trimester, with little influence in the first and second trimesters. Some of the other customization variables showed a non-constant influence on the distribution of EFW, although these findings were not statistically significant, which could have been due to limited power. Also, the quadratic and cubic terms for maternal height and weight were not statistically significant in either the Gardosi or the heteroscedastic model, indicating that a linear term may be sufficient. Removal of maternal weight from the customization model has previously been found to identify a greater proportion of LGA neonates with deliveries complicated by shoulder dystocia, NICU admission and neonatal respiratory problems that were not identified by a population-based definition of LGA, although that analysis used birthweight as the outcome [32]. These findings indicate that some characteristics (namely maternal weight) and terms (namely the quadratic and cubic terms) currently included in the Gardosi customization model may be unnecessary.
In our analysis of EFW at 38-39 weeks, customization with the heteroscedastic model identified a slightly higher proportion of SGA neonates with morbidity (8.9%) than the Gardosi method (5.7%). A similar pattern was seen for LGA and for SGA defined as < 5th percentile. Perhaps the ability of the heteroscedastic model to allow for unstable variance across the customization characteristics yielded a slight incremental improvement. Therefore, the heteroscedastic customization method has the potential to identify more fetuses at risk of growth restriction and macrosomia. This could improve the targeting of antenatal surveillance and obstetric intervention to reduce neonatal morbidity and stillbirths.
Paternal factors have not traditionally been included in customized charts. We found that greater paternal height and weight had a positive, independent influence on fetal growth, although maternal height and weight had a stronger effect. These findings are similar to those for EFW in the Generation R cohort in the Netherlands [33] and for fetal biometric measurements in an Italian cohort [34], although a study from the UK found that maternal weight had a stronger influence on birthweight while maternal and paternal height contributed similarly [35]. The stronger influence of maternal compared with paternal factors on anthropometrics during fetal life has been hypothesized to reflect maternal preservation under conditions of constraint [36].
While the six customization characteristics (gestational age, maternal pre-pregnancy weight, height, race, parity, and infant sex) are known to influence fetal growth, it is unclear whether changes in fetal growth in relation to these characteristics represent a normal physiologic adaptation or are associated with increased risk of perinatal morbidity and mortality. Because shorter and lighter women would be expected to have smaller neonates than taller and heavier women, taking maternal height and weight into account should help identify fetuses that are more likely to be constitutionally small or overgrown, rather than erroneously labeling them as not aligned with their growth potential [37]. While country (as a proxy for the local ethnic mix) has been found to be the principal factor in predicting adverse infant outcomes, compared with customizing for additional individual characteristics, there is increasing recognition that customizing for race/ethnicity might have unintended clinical consequences [38,39]. Birthweight is also known to increase with parity up to parity 4, with the largest increase between parity 0 and 1 (68 g on average) [40]. Male neonates weigh more than females, on average 141 g more at 40 weeks of gestation [21]. However, the influence of maternal short stature and nulliparity on perinatal mortality has been found to be mediated in part through SGA, indicating that smaller EFW associated with maternal constraint is both physiological and pathological [41]. Finally, fetal growth can be influenced by factors beyond the six in the customization profile, which are routinely and easily obtained during the antenatal period; these include genetic factors and external factors such as altitude, diet and lifestyle, and other environmental conditions [29,42–47].
Our study found only incremental improvements in the detection of neonatal morbidity and mortality at term with SGA and LGA defined by all three customization models compared with a population-based birthweight reference, with no difference in predictive ability (i.e., similar c-statistics across the models). This may have been due to the smaller number of adverse outcomes in a healthier population initially recruited for the primary study goal of creating a fetal growth standard [18]. However, the ability to test the customized methods in the CSL, a large pregnancy cohort, with results consistent with those of our smaller ultrasound study strengthens our findings. A major strength of our study was the longitudinal collection of ultrasound fetal measurements, which allowed us to evaluate the effect of the six customization characteristics across gestation, together with the ability to examine not only birthweight but also EFW, which is arguably more important clinically when considering obstetric interventions such as antenatal monitoring and earlier delivery to prevent stillbirth and birth-related complications.
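If the c-statistic is computed from the binary SGA indicator alone, it reduces to the average of sensitivity and specificity, which helps explain how values close to 0.5 (0.52–0.53 in this study) can coexist with modest gains in detection. The sketch below computes the c-statistic by counting concordant case-control pairs, with ties counted as one half, for a hypothetical cohort; all counts and the function name `c_statistic` are invented for illustration, and the study may have computed the statistic differently.

```python
def c_statistic(scores, outcomes):
    """Concordance (AUC): probability that a randomly chosen case scores
    higher than a randomly chosen non-case; tied pairs count as 1/2."""
    cases = [s for s, y in zip(scores, outcomes) if y]
    controls = [s for s, y in zip(scores, outcomes) if not y]
    total = 0.0
    for a in cases:
        for b in controls:
            total += 1.0 if a > b else 0.5 if a == b else 0.0
    return total / (len(cases) * len(controls))

# Hypothetical cohort of 1000 term neonates, 50 with composite morbidity.
# Score = 1 if flagged SGA (< 10th percentile), else 0.
outcomes = [1] * 50 + [0] * 950
scores = [1] * 6 + [0] * 44 + [1] * 95 + [0] * 855

sens = 6 / 50      # 12% of affected neonates flagged
spec = 855 / 950   # 90% of unaffected neonates not flagged
print(c_statistic(scores, outcomes), (sens + spec) / 2)
```

Both quantities evaluate to about 0.51 here: flagging 12% of affected neonates while also flagging 10% of unaffected ones barely separates the two groups, however the percentile itself is customized.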
The concept of accounting for maternal and fetal characteristics is appealing as a personalized medicine approach, although whether customization for maternal and fetal factors improves clinically useful detection of SGA and LGA remains controversial [6,7]. Moreover, any incremental improvement depends on several factors, and the obstetric implications of customization have been understudied [8]. All three customization methods and the population-based birthweight reference had poor ability to discriminate neonatal morbidity and mortality, indicating that we need to move beyond using a percentile cut-point to identify fetuses at risk, even though this remains standard practice. Similarly, the use of percentile cut-points to identify SGA and LGA is ingrained in standard care, and customization is used in clinical practice [9]. We found that a heteroscedastic customization model that allows for unstable variance in the customization characteristics may represent an incremental improvement over customization methods in current use. Future work may consider additional maternal, fetal, and paternal factors and identify other factors related to neonatal morbidity and mortality. Randomized clinical trials are ultimately needed to determine whether, and which, customized chart is associated with reductions in short- and long-term neonatal morbidity.